Abstract

In this paper, we explore the use of automatic syntactic simplification for improving content selection in multi-document summarization. In particular, we show how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information. We argue that the inclusion of parenthetical information in a summary is a reference-generation task rather than a content-selection one, and implement a baseline reference rewriting module. We perform our evaluations on the test sets from the 2003 and 2004 Document Understanding Conference and report that simplifying parentheticals results in significant improvement on the automated evaluation metric Rouge.

1 Introduction

Syntactic simplification is an NLP task, the goal of which is to rewrite sentences to reduce their grammatical complexity while preserving their meaning and information content. Text simplification is useful for varied reasons. Chandrasekar et al. (1996) viewed text simplification as a preprocessing tool to improve the performance of their parser. The PSET project (Carroll et al., 1999), on the other hand, focused its research on simplifying newspaper text for aphasics, who have trouble with long sentences and complicated grammatical constructs. We have previously (Siddharthan, 2002; Siddharthan, 2003) developed a shallow and robust syntactic simplification system for news reports that simplifies relative clauses, apposition and conjunction. In this paper, we explore the use of syntactic simplification in multi-document summarization.

1.1 Sentence Shortening for Summarization

We begin by surveying the literature on sentence shortening, a task related to syntactic simplification. Grefenstette (1998) proposed the use of sentence shortening to generate telegraphic texts that would help a blind reader (using text-to-speech software) skim a page in a manner similar to sighted readers. He provided eight levels of telegraphic reduction. The first (the most drastic) generated a stream of all the proper nouns in the text. The second generated all nouns in subject or object position. The third, in addition, included the head verbs. The least drastic reduction generated all subjects, head verbs, objects, subclauses and prepositions, and dependent noun heads. To reproduce an example from his paper, the sentence:

Former Democratic National Committee finance director Richard Sullivan faced more pointed questioning from Republicans during his second day on the witness stand in the Senate’s fund-raising investigation.

was shortened (at different levels of reduction) to:

• Richard Sullivan Republicans Senate.

• Richard Sullivan faced pointed questioning.

• Richard Sullivan faced pointed questioning from Republicans during day on stand in Senate fund-raising investigation.

Grefenstette (1998) provided a rule-based approach to telegraphic reduction of the kind illustrated above. Since then, Jing (2000), Riezler et al. (2003) and Knight and Marcu (2000) have explored statistical models for sentence shortening that, in addition, aim at ensuring the grammaticality of the shortened sentences.

These sentence-shortening approaches have been evaluated by comparison with human-shortened sentences and have been shown to compare favorably. However, the use of sentence shortening for the multi-document summarization task has been largely unexplored, even though intuitively it appears that sentence shortening can allow more important information to be included in a summary. Recently, Lin (2003) showed that statistical sentence-shortening approaches like Knight and Marcu (2000) do not improve content selection in summaries. Indeed, he reported that syntax-based sentence shortening resulted in significantly worse content selection by the extractive summarizer NeATS. Lin (2003) concluded that pure syntax-based compression does not improve overall summarizer performance, even though the compression algorithm performs well at the sentence level.

1.2 Simplifying Syntax for Summarization

A problem with using statistical sentence shortening for summarization is that syntactic form does not always correlate with the importance of the information contained within. As a result, syntactic sentence shortening might get rid of important information that should be included in the summary. In contrast, the syntactic simplification literature deals with syntactic constructs that can be interpreted from a rhetorical perspective. In particular, appositives and non-restrictive relative clauses are considered parentheticals in RST (Mann and Thompson, 1988). Their role is to provide background information on entities and to relate the entity to the discourse. Along with restrictive relative clauses, their inclusion in a summary should ideally be determined by a reference-generating module, not a content selector. It is thus more likely that the removal of appositives and relative clauses will impact content selection than the removal of adjectives and prepositional phrases, as attempted by sentence shortening. It is precisely this hypothesis that we explore in this paper.

1.3 Outline

We describe our sentence-clustering-based summarizer in the next section, including our experiments on using simplification of parentheticals to improve clustering in §2.1. We evaluate our summarizer in §3 and then describe our reference regenerator in §4. We present a discussion of our approach in §5 and conclude in §6.

2 The Summarizer

We use a sentence-clustering approach to multi-document summarization (similar to MultiGen (Barzilay, 2003)), where sentences in the input documents are clustered according to their similarity. Larger clusters represent information that is repeated more often across input documents; hence the size of a cluster is indicative of the importance of that information. In our current implementation, a representative (simplified) sentence is selected from each cluster, and these are incorporated into the summary in order of decreasing cluster size.

A problem with this approach is that the clustering is not always accurate. Clusters can contain spurious sentences, and a cluster’s size might then exaggerate its importance. Improving the quality of the clustering can thus be expected to improve the content of the summary. We now describe our experiments on syntactic simplification and sentence clustering. Our hypothesis is that simplifying parenthetical units (relative clauses and appositives) will improve the performance of our clustering algorithm, by preventing it from clustering on the basis of background information.

2.1 Simplification and Clustering

We use SimFinder (Hatzivassiloglou et al., 1999) for sentence clustering and its similarity metric to evaluate cluster quality; SimFinder outputs similarity values (simvals) between 0 and 1 for pairs of sentences, based on word overlap, synonymy and n-gram matches. We use the average of the simvals for each pair of sentences in a cluster as the quality-score for the cluster. Table 1 below shows the quality-scores averaged over all clusters when the original document set is and is not preprocessed using our syntactic simplification software (described in §2.2). We use 30 document sets from the 2003 Document Understanding Conference (see §3.1 for a description). For each of the experiments in Table 1, SimFinder produced around 1500 clusters, with an average cluster size between 3.6 and 3.8.
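The quality-score defined above is easy to state in code. The sketch below is illustrative only: SimFinder is not reproduced here, so the hypothetical `jaccard_simval` (word-set overlap) stands in for its simval, and the numbers it produces will not match Table 1. The corpus-level figure is the mean over clusters with at least two sentences; the paper does not say whether its standard deviation is a population or sample statistic, so the use of `pstdev` is an assumption.

```python
from itertools import combinations
from statistics import mean, pstdev


def jaccard_simval(s1, s2):
    """Hypothetical stand-in for SimFinder's simval: word-set overlap in [0, 1]."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 1.0


def cluster_quality(cluster, simval=jaccard_simval):
    """Quality-score of one cluster (>= 2 sentences): mean simval over all pairs."""
    return mean(simval(a, b) for a, b in combinations(cluster, 2))


def corpus_quality(clusters, simval=jaccard_simval):
    """Mean and (population) standard deviation of quality-scores over clusters.

    Singleton clusters have no sentence pairs and are skipped.
    """
    scores = [cluster_quality(c, simval) for c in clusters if len(c) >= 2]
    return mean(scores), pstdev(scores)


pal_cluster = [
    "In June, PAL was embroiled in a crippling three-week pilots' strike.",
    "In June, PAL was embroiled in a crippling three-week pilots' strike.",
]
print(cluster_quality(pal_cluster))  # identical sentences -> 1.0
```

Passing a different `simval` function lets the same two helpers compare clusterings built from original and simplified text, which is the comparison Table 1 reports.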


                      Orig    Simp-Paren   Simp-Conj
Av. quality-score     0.687   0.722        0.686
Std. deviation (σ)    0.130   0.112        0.126

Table 1: Syntactic Simplification and Clustering


Table 1 shows that removing parentheticals results in a 5% relative improvement in clustering (average quality-score rising from 0.687 to 0.722). This improvement is significant at the 95% confidence level, as determined by the difference in proportions test (Snedecor and Cochran, 1989). Further, the standard deviation of the quality-score decreases from 0.130 to 0.112. This suggests that removing parentheticals results in better and more robust clustering. As an example of how clustering improves, our simplification routine simplifies:

PAL, which has been unable to make payments on dlrs 2.1 billion in debt, was devastated by a pilots’ strike in June and by the region’s currency crisis, which reduced passenger numbers and inflated costs.

to:

PAL was devastated by a pilots’ strike in June and by the region’s currency crisis.

Three other sentences also simplify to the extent that they describe PAL being hit by the June strike. The resulting cluster (with quality-score = 0.94) is:

1. PAL was devastated by a pilots’ strike in June and by the region’s currency crisis.

2. In June, PAL was embroiled in a crippling three-week pilots’ strike.

3. Tan wants to retain the 200 pilots because they stood by him when the majority of PAL’s pilots staged a devastating strike in June.

4. In June, PAL was embroiled in a crippling three-week pilots’ strike.

On the other hand, splitting conjoined clauses does not appear to aid clustering (an average quality-score of 0.686, against 0.687 for the original text).¹ This indicates that the improvement from removing parentheticals is not because shorter sentences might cluster better (as SimFinder controls for sentence length, this is unlikely anyway). For confirmation, we performed one more experiment: we deleted words at random, so that the average sentence length for the modified input documents was the same as for the inputs with parentheticals removed. This actually made the clustering worse (average quality-score of 0.637), confirming that the improvement from removing parentheticals was not due to reduced sentence length. These results demonstrate that the parenthetical nature of relative clauses and appositives makes their removal useful.
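The length-matched control can be reproduced in outline. The paper does not say exactly how words were chosen for deletion; the sketch below assumes they are deleted uniformly at random across the whole document set until the average sentence length (in whitespace tokens) reaches a target, such as the average length after parenthetical removal. `random_delete` and its parameters are hypothetical.

```python
import random


def random_delete(sentences, target_avg_len, seed=0):
    """Delete randomly chosen words so the mean sentence length hits target_avg_len.

    Words are sampled uniformly over all token positions in the input, so some
    sentences lose more words than others; surviving words keep their order.
    """
    rng = random.Random(seed)
    tokenized = [s.split() for s in sentences]
    total = sum(len(toks) for toks in tokenized)
    keep = round(target_avg_len * len(tokenized))
    n_delete = max(0, total - keep)
    positions = [(i, j) for i, toks in enumerate(tokenized) for j in range(len(toks))]
    deleted = set(rng.sample(positions, n_delete))
    return [
        " ".join(w for j, w in enumerate(toks) if (i, j) not in deleted)
        for i, toks in enumerate(tokenized)
    ]


docs = ["PAL was devastated by a pilots' strike in June",
        "Tan wants to retain the 200 pilots"]
shortened = random_delete(docs, target_avg_len=5)
print(sum(len(s.split()) for s in shortened) / len(shortened))  # 5.0
```

Sampling over all positions at once, rather than per sentence, matches only the average length, which is the quantity the control experiment is described as holding fixed.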

Improved clustering, however, need not translate to improved content selection in summaries. We therefore also need to evaluate our summarizer. We do this in §3, but first we describe the summarizer in more detail.

2.2 Description of our Summarizer

Our summarizer has four stages: preprocessing of the original documents to remove parentheticals, clustering of the simplified sentences, selecting one representative sentence from each cluster, and deciding which of these selected sentences to incorporate in the summary.

We use our syntactic simplification software (Siddharthan, 2002; Siddharthan, 2003) to remove parentheticals. It uses the LT TTT (Grover et al., 2000) for POS-tagging and simple noun-chunking. It then performs apposition and relative clause identification and attachment using shallow techniques based on local context and animacy information obtained from WordNet (Miller et al., 1993).

¹ In this example, splitting subordination helps, as sentence 3 yields “the majority of PAL’s pilots staged a devastating strike in June”. However, averaged over the entire DUC’03 data set, there is no net improvement from splitting conjunction.

We then cluster the simplified sentences with SimFinder (Hatzivassiloglou et al., 1999). To further tighten the clusters and ensure that their size is representative of their importance, we post-process them as follows. SimFinder implements an incremental approach to clustering. At each incremental step, the similarity of a new sentence to an existing cluster is computed. If this is higher than a threshold, the sentence is added to the cluster. There is no backtracking; once a sentence is added to a cluster, it cannot be removed, even if it is dissimilar to all the sentences added to the cluster in the future. Hence, there are often one or two sentences that have low similarity with the final cluster. We remove these with a post-process that can be considered equivalent to a backtracking step. We redefine the criterion for a sentence to be part of the final cluster such that it has to be similar (simval above the threshold) to all other sentences in the final cluster. We prune the cluster to remove sentences that do not satisfy this criterion. Consider the following cluster and a threshold of 0.65. Each line consists of two sentence ids (P[sent_id]) and their simval.


P37    P69    0.9999999999964279
P37    P160   0.8120098824183786
P37    P161   0.8910485867563762
P37    P176   0.8971370325713883
P69    P160   0.8120098824183786
P69    P161   0.8910485867563762
P69    P176   0.8971370325713883
P160   P161   0.2333051325617611   (below threshold)
P160   P176   0.0447901658343020   (below threshold)
P161   P176   0.7517636285580539

We identify all the lines with similarity values below the threshold (P160 with P161, and P160 with P176). We then remove as few sentences as possible such that these lines are excluded. In this example, it is sufficient to remove P160. The final cluster is then:


P37    P69    0.9999999999964279
P37    P161   0.8910485867563762
P37    P176   0.8971370325713883
P69    P161   0.8910485867563762
P69    P176   0.8971370325713883
P161   P176   0.7517636285580539

The result is a much tighter cluster with one sentence fewer than the original. This pruning operation leads to even higher similarity scores than those presented in Table 1.
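The pruning step amounts to deleting the fewest sentences that touch every below-threshold pair (a minimum vertex cover of those pairs). The paper does not say how this minimum is found; since clusters average fewer than four sentences, the sketch below simply tries removal sets of increasing size. `prune_cluster` and `mean_simval` are hypothetical names, and simvals are assumed to be available for every pair, as in the listing above. On that example with threshold 0.65 it removes P160 and raises the mean pairwise simval from about 0.723 to 0.888.

```python
from itertools import combinations


def prune_cluster(sentence_ids, simvals, threshold=0.65):
    """Remove as few sentences as possible so all remaining pairs reach threshold.

    simvals maps frozenset({id_a, id_b}) -> similarity for every pair in the cluster.
    Brute force over removal sets of increasing size; fine for small clusters.
    Removing every sentence always satisfies the criterion, so a list is returned.
    """
    bad_pairs = [pair for pair, s in simvals.items() if s < threshold]
    for k in range(len(sentence_ids) + 1):
        for removed in combinations(sentence_ids, k):
            dropped = set(removed)
            # every below-threshold pair must lose at least one of its sentences
            if all(pair & dropped for pair in bad_pairs):
                return [s for s in sentence_ids if s not in dropped]


def mean_simval(sentence_ids, simvals):
    """Average simval over all pairs of the given sentences (the quality-score)."""
    pairs = [frozenset(p) for p in combinations(sentence_ids, 2)]
    return sum(simvals[p] for p in pairs) / len(pairs)


listing = [
    ("P37", "P69", 0.9999999999964279),
    ("P37", "P160", 0.8120098824183786),
    ("P37", "P161", 0.8910485867563762),
    ("P37", "P176", 0.8971370325713883),
    ("P69", "P160", 0.8120098824183786),
    ("P69", "P161", 0.8910485867563762),
    ("P69", "P176", 0.8971370325713883),
    ("P160", "P161", 0.2333051325617611),
    ("P160", "P176", 0.0447901658343020),
    ("P161", "P176", 0.7517636285580539),
]
simvals = {frozenset((a, b)): s for a, b, s in listing}
cluster = ["P37", "P69", "P160", "P161", "P176"]
pruned = prune_cluster(cluster, simvals)
print(pruned)  # ['P37', 'P69', 'P161', 'P176']
print(round(mean_simval(cluster, simvals), 3), round(mean_simval(pruned, simvals), 3))  # 0.723 0.888
```

For larger clusters the exhaustive search grows exponentially; a greedy rule (repeatedly drop the sentence in the most below-threshold pairs) would be a cheaper approximation, though it is not guaranteed to remove the fewest sentences.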

Having pruned the clusters, we select a representative sentence from each cluster based on tf*idf. We then incorporate these representative sentences into the summary in decreasing order of their cluster size. For clusters with the same size, we incorporate sentences in decreasing order of tf*idf. Unlike MultiGen (Barzilay, 2003), which is generative and constructs a sentence from each cluster using information fusion, we implement extractive summarization and select one (simplified) sentence from each cluster. We discuss the scope for generation in our summarizer in §4 and §6.
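Representative selection and ordering can be sketched as follows. The paper specifies only that representatives are chosen by tf*idf and ordered by decreasing cluster size, with ties broken by tf*idf; how a sentence's tf*idf score is computed and how the length limit is enforced are not stated. The sketch therefore assumes a precomputed word-weight dictionary (hypothetical), scores a sentence by summing the weights of its distinct words, and truncates the last sentence at a word budget (100 words, as in DUC’03).

```python
def sentence_score(sentence, weights):
    """Assumed tf*idf score: sum of the weights of the sentence's distinct words."""
    return sum(weights.get(w, 0.0) for w in set(sentence.lower().split()))


def build_summary(clusters, weights, max_words=100):
    """Pick one representative per cluster, order them, and fill a word budget."""
    reps = []
    for cluster in clusters:
        best = max(cluster, key=lambda s: sentence_score(s, weights))
        reps.append((len(cluster), sentence_score(best, weights), best))
    # larger clusters first; equal sizes ordered by decreasing tf*idf score
    reps.sort(key=lambda r: (-r[0], -r[1]))
    summary, used = [], 0
    for _, _, sentence in reps:
        remaining = max_words - used
        if remaining <= 0:
            break
        words = sentence.split()
        summary.append(" ".join(words[:remaining]))  # truncates only the last sentence
        used += min(len(words), remaining)
    return summary


weights = {"strike": 2.0, "pal": 1.5, "june": 1.0, "tan": 0.5}  # toy weights
clusters = [
    ["PAL was devastated by a pilots' strike in June", "Tan wants to retain the 200 pilots"],
    ["PAL pilots strike"],
    ["Tan met pilots", "June strike talks", "PAL strike June"],
]
print(build_summary(clusters, weights, max_words=5))  # ['PAL strike June', 'PAL was']
```

Because the cluster-size key comes first in the sort, tf*idf only matters for choosing within a cluster and for breaking ties between clusters of equal size, as described above.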

3 Evaluation

We present two evaluations in this section. Our system, as described in the previous section, was entered for the DUC’04 competition. We describe how it fared in §3.3. We also present an evaluation over a larger data set to show that syntactic simplification of parenthetical units significantly improves content selection (§3.4). But first, we describe our data (§3.1) and the evaluation metric Rouge (§3.2).

3.1 Data

The Document Understanding Conference (DUC) has been run annually since 2001 and is the biggest summarization evaluation effort, with participants from all over the world. In 2003, DUC put special emphasis on the development of automatic evaluation methods and also started providing participants with multiple human-written models needed for reliable evaluation. Participating generic multi-document summarizers were tested on 30 event-based sets in 2003 and 50 sets in 2004, all 80 containing roughly 10 newswire articles each. There were four human-written summaries for each set, created for evaluation purposes. In DUC’03, the task was to generate 100-word summaries, while in DUC’04, the limit was changed to 665 bytes.

3.2 Evaluation Metric

We evaluated our summarizer on the DUC test sets using the Rouge automatic scoring metric (Lin and Hovy, 2003). The experiments in Lin and Hovy (2003) show that among n-gram approaches to scoring, Rouge-1 (based on unigrams) has the highest correlation with human scores. In 2004, an additional automatic metric based on the longest common subsequence was included (Rouge-L), which aims to overcome some deficiencies of Rouge-1, such as its susceptibility to ungrammatical keyword packing by dishonest summarizers.² For our evaluations, we use the Rouge settings from DUC’04: stop words are included, words are Porter-stemmed, and all four human model summaries are used.
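For intuition about what the two scores measure, here is a simplified sketch of Rouge-1 recall and LCS-based Rouge-L recall. It is not the official Rouge toolkit: it does no Porter stemming, treats each summary as one token sequence rather than using the toolkit's sentence-level LCS, pools unigram counts over all model summaries for Rouge-1, and simply averages per-model scores for Rouge-L. The function names are hypothetical.

```python
from collections import Counter


def rouge_1_recall(candidate, models):
    """Unigram recall with clipped counts, pooled over all model summaries."""
    cand = Counter(candidate.lower().split())
    matched = total = 0
    for model in models:
        ref = Counter(model.lower().split())
        matched += sum(min(count, cand[word]) for word, count in ref.items())
        total += sum(ref.values())
    return matched / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate, models):
    """LCS length divided by model length, averaged over the model summaries."""
    cand = candidate.lower().split()
    scores = []
    for model in models:
        ref = model.lower().split()
        scores.append(lcs_length(cand, ref) / len(ref))
    return sum(scores) / len(scores)


models = ["the cat sat on the mat", "a cat was on the mat"]
print(rouge_1_recall("the cat sat", models), rouge_l_recall("the cat sat", models))
```

The LCS term rewards words that appear in the same order as in the model, which is why Rouge-L is less easily gamed by packing unordered keywords than Rouge-1.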

3.3 DUC’04 Evaluation

We entered our system as described above for the DUC’04 competition. There were 35 entries for the generic summary task, including ours. At the 95% confidence level, our system was significantly superior to 23 systems and indistinguishable from the other 11 (using Rouge-L). Using Rouge-1, there was one system that was significantly superior to ours, 10 that were indistinguishable and 23 that were significantly inferior. We give a few Rouge scores from DUC’04 in Table 2 below for comparison purposes. The 95% confidence intervals for our summarizer are ±0.0123 (Rouge-1) and ±0.0130 (Rouge-L).

Summarizer                  Rouge-1   Rouge-L
Our Summarizer              0.3672    0.3804
Best Summarizer             0.3822    0.3895
Median Summarizer           0.3429    0.3538
Worst Summarizer            0.2419    0.2763
Av. of Human Summarizers    0.4030    0.4202

Table 2: Rouge Scores for DUC’04 competition.

² More detail on the Rouge evaluation metrics can be obtained online from http://www.isi.edu/~cyl/papers/ROUGE-Working-Note-v1.3.1.pdf

3.4 Benefits from Syntactic Simplification

Table 3 below shows the Rouge-1 and Rouge-L scores for our summarizer when the text is and is not simplified to remove parentheticals. The data for this evaluation consists of the 80 document sets from DUC’03 and DUC’04. We did not use data from previous years, as these included only one human model summary and Rouge requires multiple models to be reliable.


Summarizer               Rouge-1   Rouge-L
With simplification      0.3608    0.3839
Without simplification   0.3398    0.3643

Table 3: Rouge Scores for DUC’03 and ’04 data.

The improvement in performance when the text is preprocessed to remove parenthetical units is significant at the 95% confidence level. When compared to the 34 other participants of DUC’04, the simplification step raises our clustering-based summarizer from languishing in the bottom half to being in the top third and statistically indistinguishable from the top system at 95% confidence (using Rouge-L).

4 Reference Regeneration

As the evaluations above show, preprocessing text with syntactic simplification significantly improves content selection for our summarizer. This is encouraging; however, our summarizer, as described so far, generates summaries that contain no parentheticals (appositives or relative clauses), as these are removed from the original texts prior to summarization. We believe that the inclusion of parenthetical information about entities should be treated as a reference generation task, rather than a content selection one. Our analysis of human summaries suggests that people select parentheticals to improve coherence and to aid the hearer in identifying referents and relating them to the discourse. A complete treatment of parentheticals in reference regeneration in summaries is beyond the scope of this paper, the emphasis of which is content selection rather than coherence. We plan to address this issue elsewhere; in this paper, we restrict ourselves to describing a baseline approach to incorporating parentheticals in regenerated references to people in summaries.

4.1 Including Parentheticals

Our text-simplification system (Siddharthan, 2003) provides us with a list of all relative clauses, appositives and pronouns that attach to or co-refer with every entity. We used a named entity tagger (Wacholder et al., 1997) to collect all such information for every person. The processed references to the same people across documents were aligned using the named entity tagger’s canonical name, resulting in tables similar to the one shown in Figure 1.

Abdullah Ocalan

APW19981106.1119: [IR] Abdullah Ocalan; [AP] leader of the outlawed Kurdistan Worker’s Party; [CO] Ocalan;

APW19981104.0265: [IR] Kurdish rebel leader Abdullah Ocalan; [RC] who is wanted in Turkey on charges of heading a terrorist organization; [CO] Ocalan; [RC] who leads the banned Kurdish Workers Party, or PKK, which has been fighting for Kurdish autonomy in Turkey since 1984; [CO] Ocalan; [CO] Ocalan; [CO] Ocalan;

APW19981113.0541: [IR] Abdullah Ocalan; [AP] leader of Kurdish insurgents; [RC] who has been sought for years by Turkey; [CO] Ocalan; [CO] Ocalan; [CO] Ocalan; [PR] He; [CO] Ocalan; [CO] Ocalan; [PR] his; [CO] Ocalan; [CO] Ocalan; [CO] Ocalan; [PR] his; [CO] Ocalan; [CO] Ocalan; [AP] a political science dropout from Ankara university in 1978;

APW19981021.0554: [IR] rebel leader Abdullah Ocalan; [PR] he; [CO] Ocalan;

Figure 1: Example information collected for entities in the input. The canonical form of the named entity is given at the top, and each entry begins with the input article id. IR stands for “initial reference”, CO for subsequent noun co-reference, PR for pronoun reference, AP for apposition and RC for relative clause.

We automatically post-edited our summaries using a modified version of the module described in Nenkova and McKeown (2003). This module normalizes references to people in the summary, by introducing them in detail when they are first mentioned and using a short reference for subsequent mentions; these operations were shown to improve the readability of the resulting summaries.

Nenkova and McKeown (2003) avoided including parentheticals due to both the unavailability of fast and reliable identification and attachment of appositives and relative clauses, and theoretical issues relating to the selection of the most suitable parenthetical unit in the new summary context. In order to ensure a balanced inclusion of parenthetical information in our summaries, we modified their initial approach to allow for including relative clauses and appositives in initial references.

We made use of two empirical observations made by Nenkova and McKeown (2003) based on human summaries: a first mention is very likely to be modified in some way (probability of 0.76), and subsequent mentions are very unlikely to be post-modified (probability of 0.01-0.04). We therefore only considered incorporating parentheticals in first mentions. We constructed a set consisting of appositives and relative clauses from initial references in the input documents and an empty string option (for the example in Figure 1, the set would be {“leader of the outlawed Kurdistan Worker’s Party”, “who is wanted in Turkey on charges of heading a terrorist organization”, “leader of Kurdish insurgents”, “who has been sought for years by Turkey”, ε}). We then selected one member of the set randomly for inclusion in the initial reference. A more sophisticated approach to the treatment of parentheticals in reference regeneration, based on lexical cohesion constraints, is currently underway.
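The candidate-set construction and random choice can be sketched over Figure 1-style tables. Two assumptions are made here: a parenthetical belongs to the initial reference if it follows the [IR] entry before any later [CO] or [PR] entry, and the chosen parenthetical is attached to the canonical name with commas. `FIGURE_1` is an abridged encoding of Figure 1, and the function names are hypothetical. On that data the sketch recovers the four parentheticals in the set above; the empty string plays the role of ε.

```python
import random

# Abridged encoding of Figure 1: for each input article, the tagged references
# to one person in document order (tags as in the Figure 1 caption).
FIGURE_1 = {
    "APW19981106.1119": [
        ("IR", "Abdullah Ocalan"),
        ("AP", "leader of the outlawed Kurdistan Worker's Party"),
        ("CO", "Ocalan"),
    ],
    "APW19981104.0265": [
        ("IR", "Kurdish rebel leader Abdullah Ocalan"),
        ("RC", "who is wanted in Turkey on charges of heading a terrorist organization"),
        ("CO", "Ocalan"),
        ("RC", "who leads the banned Kurdish Workers Party, or PKK, which has been "
               "fighting for Kurdish autonomy in Turkey since 1984"),
    ],
    "APW19981113.0541": [
        ("IR", "Abdullah Ocalan"),
        ("AP", "leader of Kurdish insurgents"),
        ("RC", "who has been sought for years by Turkey"),
        ("CO", "Ocalan"),
        ("PR", "He"),
        ("AP", "a political science dropout from Ankara university in 1978"),
    ],
    "APW19981021.0554": [("IR", "rebel leader Abdullah Ocalan"), ("PR", "he")],
}


def initial_parentheticals(articles):
    """Appositives (AP) and relative clauses (RC) attached to initial references.

    Assumption: a parenthetical is attached to the initial reference if it comes
    after the [IR] entry and before any later [CO] or [PR] entry.
    """
    candidates = []
    for refs in articles.values():
        attached_to_initial = False
        for tag, text in refs:
            if tag == "IR":
                attached_to_initial = True
            elif tag in ("AP", "RC"):
                if attached_to_initial and text not in candidates:
                    candidates.append(text)
            else:  # CO or PR: later parentheticals attach to subsequent mentions
                attached_to_initial = False
    return candidates


def first_mention(name, candidates, rng):
    """Randomly pick a candidate or the empty option and attach it to the name."""
    choice = rng.choice(candidates + [""])  # "" is the empty-string option
    # assumes the mention is followed by the rest of its clause, hence the closing comma
    return f"{name}, {choice}," if choice else name


candidates = initial_parentheticals(FIGURE_1)
print(first_mention("Abdullah Ocalan", candidates, random.Random(0)))
```

Passing in a seeded `random.Random` keeps runs reproducible; the lexical-cohesion approach mentioned above would replace `rng.choice` with a scored selection.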

4.2 Evaluation

We repeated the evaluations on the 80 document sets from DUC’03 and DUC’04, using our simplification+clustering-based summarizer with the reference regeneration component included. The results are shown in Table 4 below. At 95% confidence, the difference in performance is not significant.


Summarizer                  Rouge-1   Rouge-L
Without reference rewrite   0.3608    0.3839
With reference rewrite      0.3599    0.3854

Table 4: Rouge scores for DUC’03 and ’04 data.


This is an interesting result because it suggests that rewriting references does not adversely affect content selection. This might be because the extra words added to initial references are partly compensated for by words removed from subsequent references. In any case, reference rewriting can noticeably improve readability, as the examples in Figures 2 and 3 illustrate. We are also optimistic that a more focused reference rewriting process based on lexical-cohesive constraints and information-theoretic measures can improve Rouge content-evaluation scores as well as summary readability.

Before:

Pinochet was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.

After:

Gen. Augusto Pinochet, the former Chilean dictator, was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.

Figure 2: First three sentences from a machine-generated summary before/after reference regeneration.

5 Surface Analysis of Summaries

Table 5 compares the average sentence lengths of our summaries (after reference rewriting) with those of the original news reports, human (model) summaries and machine summaries generated by the participating summarizers at DUC’03 and ’04.

These figures confirm various intuitions about human vs. machine-generated summaries: machine summaries tend to be based on sentence extraction, and many have an explicitly encoded preference for long sentences (assumed to be more informative), whereas humans tend to select information at a sub-sentential level. As a result, human summaries contain on average shorter sentences than the original, while machine summaries contain on average longer sentences than the original. Interestingly, our summarizer, like human summarizers, generates shorter sentences than the original news text.


Text                        Av. sentence length (words)
News reports                21.43
Human summaries             17.43
Other machine summaries     28.75
Our summaries               19.16

Table 5: Av. sentence lengths in 80 document sets from DUC’03 and ’04.


Equally interesting is the distribution of parentheticals. The original news reports contain on average one parenthetical unit (appositive or relative clause) every 3.9 sentences. The machine summaries contain on average one parenthetical every 3.3 sentences. On the other hand, human summaries contain only one parenthetical unit per 8.9 sentences on average.
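The two surface statistics used in this section are simple ratios. The sketch below assumes sentences are already split and that a per-sentence count of appositives and relative clauses is available (for example, from the simplifier); it measures length in whitespace tokens, which may differ slightly from the tokenization behind Table 5. `surface_stats` is a hypothetical helper.

```python
def surface_stats(sentences, parenthetical_counts):
    """Return (average sentence length, sentences per parenthetical unit).

    parenthetical_counts[i] is the number of appositives plus relative clauses
    in sentences[i]. Length is measured in whitespace-separated tokens.
    """
    n = len(sentences)
    avg_len = sum(len(s.split()) for s in sentences) / n
    n_paren = sum(parenthetical_counts)
    per_paren = n / n_paren if n_paren else float("inf")
    return avg_len, per_paren


summary = [
    "PAL, which has been unable to make payments on dlrs 2.1 billion in debt, "
    "was devastated by a pilots' strike in June and by the region's currency "
    "crisis, which reduced passenger numbers and inflated costs.",
    "In June, PAL was embroiled in a crippling three-week pilots' strike.",
]
print(surface_stats(summary, [2, 0]))  # (23.0, 1.0)
```

Expressing the parenthetical rate as "one every N sentences" means larger values indicate fewer parentheticals, so human summaries (8.9) and ours (10.0) sit well above the news reports (3.9).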

In other words, human summaries contain fewer parenthetical units per sentence than the original reports; this appears to be a deliberate attempt at including more events and less background information in a summary. Machine summaries tend to contain on average more parentheticals than the original reports. This is possibly an artifact of the preference for longer sentences, but the data suggests that 100-word machine summaries use up valuable space by presenting unnecessary background information.

Our summaries contain one parenthetical unit every 10.0 sentences. This is closer to human summaries than to the average machine summary, again suggesting that our approach of treating the inclusion of parentheticals as a reference generation task is justified.

Before:

Turkey has been trying to form a new government since a coalition government led by Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government. Demirel consulted Turkey’s party leaders immediately after Ecevit gave up.

After:

Turkey has been trying to form a new government since a coalition government led by Prime Minister Mesut Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Premier-designate Bulent Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government. President Suleyman Demirel consulted Turkey’s party leaders immediately after Ecevit gave up.

Figure 3: First four sentences from another machine summary before/after reference regeneration.

6 Conclusions and Future Work

We have demonstrated that simplifying news reports by removing parenthetical information results in better sentence clustering and consequently better summarization. We have further demonstrated that using a reference rewriting module to introduce parentheticals as a post-process does not significantly affect the score on an automated content-evaluation metric; indeed, we believe that a more sophisticated rewriting module might improve performance on content selection. In addition, the summaries produced by our summarizer closely resemble human summaries in surface features such as average sentence length and the distribution of relative clauses and appositives.

The results in this paper might be useful to generative approaches to summarization. It is likely that the improved clustering will make operations like information fusion (Barzilay, 2003; Dalianis and Hovy, 1996) within clusters more reliable. We plan to examine whether this is indeed the case.

We feel that the performance of our summarizer is encouraging (its DUC’04 Rouge-1 and Rouge-L scores are about 90% of the human average), as it is conceptually very simple: it selects informative sentences from the largest clusters and does not contain any theoretically inelegant optimizations, such as excluding overly long or short sentences.

Our approach of extracting parentheticals as a pre-process also provides a framework for reference rewriting, by allowing the summarizer to select background information independently of the main content. We believe that there is a lot of research left to be carried out in generating references in open domains and will address this issue in future work.